Skip to content

Make partition_key provenance-only and inherit it onto asset events#67718

Open
Lee-W wants to merge 1 commit into
apache:mainfrom
astronomer:partition-key-semantics
Open

Make partition_key provenance-only and inherit it onto asset events#67718
Lee-W wants to merge 1 commit into
apache:mainfrom
astronomer:partition-key-semantics

Conversation

@Lee-W
Copy link
Copy Markdown
Member

@Lee-W Lee-W commented May 29, 2026

Why

partition_key semantics for DagRun and AssetEvent had three rough edges, all on unreleased AIP-76 behaviour (first ships in 3.2.0):

  • A task emitting outlet events could overwrite DagRun.partition_key via a runtime back-fill. That blurred ownership of the run's key — it should be provenance set by the scheduler / trigger side, not rewritten by task emission.
  • An outlet event emitted without an explicit key landed with partition_key = None, which left downstream partitioned consumers unable to route off it — even though the run it came from carried a key.
  • The REST trigger endpoint accepted a manual partition_key for Dags that are not partitioned at all, silently ignoring it.

closes: #67368

What

  • Drop the runtime back-fill in register_asset_changes_in_db. A task emitting outlet events no longer writes DagRun.partition_key; the run's key stays provenance-only, set by the scheduler / trigger side.
  • An AssetEvent emitted without an explicit partition_key now inherits the run's partition_key instead of staying None, so downstream partitioned consumers can route off it.
  • The REST trigger endpoint now rejects a partition_key for non-partitioned Dags with a 400 (Dags that are neither timetable.partitioned nor timetable.partitioned_at_runtime). The validation runs inside the route's error-handling block so the domain ValueError surfaces as 400 rather than 500.

All behaviour is unreleased, so the changes land in place with no compatibility shim and no newsfragment.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [Claude] following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

…vent

- Drop runtime back-fill so task-emitted keys no longer overwrite
  DagRun.partition_key; the run's key stays provenance-only.
- AssetEvent.partition_key now inherits the run's key when a task emits
  an event without an explicit key, instead of staying None (which left
  downstream partitioned consumers unable to trigger).
- Reject a manual partition_key on the REST trigger endpoint for
  non-partitioned Dags with a 400.

All behaviour is unreleased (AIP-76 first ships in 3.2.0), so changes
land in place with no compat shim and no newsfragment.

closes: apache#67368
@Lee-W Lee-W force-pushed the partition-key-semantics branch from 87119dd to dd8e7c4 Compare May 29, 2026 14:36
@Lee-W Lee-W requested a review from uranusjr as a code owner May 29, 2026 14:36
and not dag.timetable.partitioned
and not dag.timetable.partitioned_at_runtime
):
raise ValueError(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rejection only runs on the public REST trigger path. The Execution API route (trigger_dag_run in execution_api/routes/dag_runs.py:144) passes partition_key straight into trigger_dag then create_dagrun with no partitionability check, so a non-partitioned run can still be created with a key there, and with the new inheritance it then emits partitioned-looking AssetEvents. TriggerDagRunOperator doesn't expose partition_key today so it isn't reachable through the shipped operator, but the wire field is forwarded ungated. Putting the check in create_dagrun/DagRun.__init__ would cover both entrypoints instead of just the request body.

Copy link
Copy Markdown
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. The one inline note (the partition_key rejection lives in the REST request body, so the Execution API trigger path at dag_runs.py:144 still forwards a key into create_dagrun ungated) is an optional suggestion, not a blocker, since TriggerDagRunOperator doesn't expose partition_key today. Worth considering folding the check into create_dagrun so both entrypoints share one gate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clarify partition_key semantics across DagRun / AssetEvent / runtime

2 participants